An Efficient Algorithm To Induce Minimum Average Lookahead Grammars For Incremental LR Parsing

Authors

Dekai Wu

Yihai Shen

Abstract

We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a nondeterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Clearly, naïve approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n^3 + 2^m) greedy approximation algorithm for this task that is quite efficient in practice.
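
To make the quantity being minimized concrete, here is a small Python sketch that measures how much lookahead a single parsing decision needs and averages it over a corpus. It is a brute-force illustration of the objective only, not the authors' algorithm (which relies on GLR ancestor tables and incremental LR table construction). Everything in it is an assumption for illustration: each conflict point is given as a map from competing parser actions to the token sequences that may follow under that action; a decision is taken to need the smallest k at which the k-token prefixes of those sequences are pairwise disjoint; and a sentence's lookahead is taken as the maximum over its decisions, averaged across sentences. The names `k_prefixes`, `min_lookahead` and `average_lookahead` are hypothetical.

```python
# Hypothetical sketch of the *objective* (average lookahead), not the paper's algorithm.
from itertools import combinations
from statistics import fmean
from typing import Dict, Iterable, List, Optional, Set, Tuple

Continuation = Tuple[str, ...]           # token sequence that may follow a conflict point
Decision = Dict[str, Set[Continuation]]  # competing action -> its possible continuations


def k_prefixes(strings: Iterable[Continuation], k: int) -> Set[Continuation]:
    """Truncate each continuation to its first k tokens (a FIRST_k-style view)."""
    return {s[:k] for s in strings}


def min_lookahead(decision: Decision) -> Optional[int]:
    """Smallest k whose k-token prefixes separate every pair of competing actions.

    Returns None when two actions share an identical full continuation,
    i.e. no finite lookahead can make this decision deterministic.
    """
    if len(decision) <= 1:
        return 0  # no conflict: the action is forced without looking ahead
    longest = max((len(s) for conts in decision.values() for s in conts), default=0)
    for k in range(longest + 1):
        prefix_sets = [k_prefixes(conts, k) for conts in decision.values()]
        if all(a.isdisjoint(b) for a, b in combinations(prefix_sets, 2)):
            return k
    return None


def average_lookahead(corpus: List[List[Decision]]) -> float:
    """Assumed aggregation: per-sentence max over decisions, then mean over sentences."""
    per_sentence = []
    for decisions in corpus:
        ks = [min_lookahead(d) for d in decisions]
        if any(k is None for k in ks):
            raise ValueError("a decision cannot be resolved by any finite lookahead")
        per_sentence.append(max(ks, default=0))
    return fmean(per_sentence)


if __name__ == "__main__":
    # Made-up attachment conflict: the continuations first differ at the third token.
    pp = {
        "reduce": {("with", "a", "telescope", "$")},
        "shift": {("with", "a", "hat", "$")},
    }
    print(min_lookahead(pp))  # prints 3
```

Enumerating continuations explicitly like this is only practical for toy inputs; the sketch says nothing about how the paper's method keeps the search tractable.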

Related Papers

Symbolic Lookaheads for Bottom-up Parsing

We present algorithms for the construction of LALR(1) parsing tables, and of LR(1) parsing tables of reduced size. We first define specialized characteristic automata whose states are parametric w.r.t. variables symbolically representing lookahead-sets. The propagation flow of lookaheads is kept in the form of a system of recursive equations, which is resolved to obtain the concrete LALR(1) tab...

Incremental Parser Generation for Tree Adjoining Grammars

This paper describes the incremental generation of parse tables for the LR-type parsing of Tree Adjoining Languages (TALs). The algorithm presented handles modifications to the input grammar by updating the parser generated so far. In this paper, a lazy generation of LR-type parsers for TALs is defined in which parse tables are created by need while parsing. We then describe an incremental parser ...

Deterministic Left to Right Parsing of Tree Adjoining Languages

We define a set of deterministic bottom-up left to right parsers which analyze a subset of Tree Adjoining Languages. The LR parsing strategy for Context Free Grammars is extended to Tree Adjoining Grammars (TAGs). We use a machine, called Bottom-up Embedded Push Down Automaton (BEPDA), that recognizes in a bottom-up fashion the set of Tree Adjoining Languages (and exactly this set). Each parser...

An Efficient Augmented-Context-Free Parsing Algorithm

An efficient parsing algorithm for augmented context-free grammars is introduced, and its application to on-line natural language interfaces discussed. The algorithm is a generalized LR parsing algorithm, which precomputes an LR shift-reduce parsing table (possibly with multiple entries) from a given augmented context-free grammar. Unlike the standard LR parsing algorithm, it can handle arbitra...

Construction of Efficient Generalized LR Parsers

We show how LR parsers for the analysis of arbitrary context-free grammars can be derived from classical Earley’s parsing algorithm. The result is a Generalized LR parsing algorithm working at complexity O(n) in the worst case, which is achieved by the use of dynamic programming to represent the non-deterministic evolution of the stack instead of graph-structured stack representations, as has of...

Publication date: 2004
